Xuanjing Huang

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Feb 05, 2026
Via arXiv

DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training

Feb 05, 2026
Via arXiv

Steering LLMs via Scalable Interactive Oversight

Feb 04, 2026
Via arXiv

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models

Feb 04, 2026
Via arXiv

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Feb 03, 2026
Via arXiv

CL-bench: A Benchmark for Context Learning

Feb 03, 2026
Via arXiv

CURP: Codebook-based Continuous User Representation for Personalized Generation with LLMs

Jan 31, 2026
Via arXiv

BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

Jan 30, 2026
Via arXiv

AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts

Jan 29, 2026
Via arXiv

ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing

Jan 29, 2026
Via arXiv